Summary of Review

The Education Trust research report Stuck Schools suggests a framework for identify- 
ing chronically low-performing schools in need of turnaround. The study uses Maryland 
and Indiana to show that some low-performing schools make progress while others re- 
main stagnant. The report has four serious problems of reliability and validity, however. 
First, the norm-referenced methodology guarantees “failed” schools independent of any
true performance or improvement level by the school. Second, the report’s reliance on 
state assessment data is misleading, and some schools’ reported growth may be an artifact 
of regression to the mean and ceiling effects as well as instructional and testing practices. 
Third, the use of a linear growth model is questionable, since schools may not follow a 
strictly linear pattern of improvement. Fourth, the label of “stuck” becomes problematic 
given that there is no research-based guidance on how to improve schools other than
vague prescriptions. In conclusion, the report’s methods are so simplistic, arbitrary, and
ill-fitting with its own assumptions that they are more harmful than helpful to sound
policymaking. There remains an outstanding question of how to help struggling schools after
identification, but we need to know first whether the identification is based on reliable and
valid measures, and if so, what school factors account for these differences.




Review 



I. Introduction 

Stuck Schools, a research report from the 
Education Trust, written by Natasha Usho- 
mirsky and Daria Hall, is aimed at providing 
a framework for identifying chronically low- 
performing schools that are arguably in need 
of school-turnaround interventions. 1 The 
study selects two states, Maryland and Indi- 
ana, as showcase examples to demonstrate 
how to use the currently available state as- 
sessment database to identify such schools. 

The authors’ chosen classification is based 
on two variables, performance (status) and 
improvement (growth). First, they sort 
schools into three categories (high-performing,
average-performing, and low-performing)
based on “status”: the baseline
status of achievement, or how well students 
perform, on average, over the first three 
years of the five-year period under study. 
(Slightly different study periods are used for 
the two states.) The report then classifies the 
same schools into another three categories 
based on “growth”: how much the schools 
improve their proficiency rates over the five- 
year study period. 

Next, the report cross-classifies schools according
to these two dimensions and examines
how many schools are simultaneously
low-performing and low-improving.
Schools that fall within cell D in Figure 1 
are designated “stuck” schools. The study 
also defines and identifies “chronically low- 
performing schools” as those where perfor- 
mance in the final three years of the study 
period fell consistently below the bottom 5% 
bar; several “stuck” schools are classified as 
chronically low-performing as well. 
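
To make the classification logic concrete, the following is a minimal sketch and not the report's own procedure. It assumes that "status" is the mean proficiency rate over the first three years, that "growth" is the slope of a simple linear regression over all five years (the method this review later attributes to the report's appendix), and that "low" means the bottom quartile on each dimension. The school names, data, and function names are hypothetical.

```python
import statistics

def ols_slope(rates):
    """Least-squares slope of proficiency rates on year index 0, 1, 2, ..."""
    n = len(rates)
    mean_x = (n - 1) / 2
    mean_y = sum(rates) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(rates))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx

def flag_stuck(schools):
    """Return schools in the bottom quartile on BOTH status and growth.

    Assumption: status = mean of the first three years; growth = OLS slope.
    """
    status = {name: sum(r[:3]) / 3 for name, r in schools.items()}
    growth = {name: ols_slope(r) for name, r in schools.items()}
    status_cut = statistics.quantiles(list(status.values()), n=4)[0]
    growth_cut = statistics.quantiles(list(growth.values()), n=4)[0]
    return sorted(
        name for name in schools
        if status[name] <= status_cut and growth[name] <= growth_cut
    )

# Hypothetical five-year proficiency rates (percent proficient).
schools = {
    "A": [90, 91, 92, 93, 94],
    "B": [80, 82, 84, 86, 88],
    "C": [75, 75, 76, 76, 77],
    "D": [70, 73, 76, 79, 82],
    "E": [65, 66, 67, 68, 69],
    "F": [60, 64, 68, 72, 76],
    "G": [40, 46, 52, 58, 64],   # low status, high growth
    "H": [35, 35, 34, 35, 34],   # low status, low growth
}
print(flag_stuck(schools))  # ['H']
```

In this toy data, schools G and H both start in the bottom quartile on status, but only H is also in the bottom quartile on growth, so only H would be labeled "stuck" under these assumed rules.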

This study’s focus on the schools that are 
both low-performing and low-improving is 
best understood in the context of a recent 
policy paradigm shift in the American 
school-accountability system, from a narrow 
focus on school performance to a dual focus on
both performance and improvement. The 
Education Trust study is designed to inform 
the Obama Administration’s new school ac- 
countability policies, which concentrate on 
state interventions in chronically low- 
performing schools that do not show signs 
of improvement. 

As the report acknowledges, its analysis 
does not fully align with the current accoun- 
tability and school identification policies in 
place through NCLB. While NCLB relies primarily
on the status model of school performance
evaluation, this report builds upon the
model of combined status and growth. It is similar
in intent to the turnaround provisions of the
administration’s Race to the Top program.

Figure 1. Classification of schools by the level of performance (status) and
improvement (growth)

Moreover, while NCLB takes a criterion- 
referenced (standards-based) approach, with 
predetermined performance targets and 
timelines for all schools, the study takes a 
norm-referenced approach (i.e., a comparison
of relative performance rankings) to
identify schools needing improvement. 

II. Findings and Conclusions 
of the Report 

The study examines trends in overall per- 
formance and improvement over time. Spe- 
cifically, it raises the following four related 
research questions: (1) What did perfor- 
mance look like several years ago? (2) How 
big were the annual gains at high-improving 
schools? (3) How about at low-improving 
ones? (4) Among the lowest-performing 
schools, how many remained stuck, how 
many made extraordinary gains, and how 
many fell somewhere in between? 

For the first question, the study finds 
enormous variations among schools in base- 
line performance. The baseline (2005-07) 
reading proficiency rate in Maryland (using 
the state’s own accountability ratings) is 
79% on average, but it ranges from 27% to 
99% among Maryland’s 1,066 schools serv- 
ing any combination of grades 3-8. The 
baseline (2004-06) reading proficiency rate 
in Indiana is 73% on average. But it ranges 
from 26% to 96% among Indiana’s 1,477 
schools serving any combination of grades 
3-8. The study then identifies the bottom 
25% of schools in both states. In those lowest-performing
schools, about 58% of students
meet the proficiency standard in Maryland
and 57% in Indiana.

For the second and third questions, the study
reports different patterns of school academic
improvement in the two states. In Maryland,
the study shows that schools at all
levels of performance generally made progress,
with low-performing schools making the biggest
gains. In Indiana, the study shows that the
average rate of reading proficiency remained
stagnant from 2004 to 2008, and that gains
varied little between high-performing
and low-performing schools.

For the fourth question, the study reports 
fewer stuck or chronically low-performing 
schools in Maryland (4% of the state’s ele- 
mentary and middle schools) than in Indiana 
(15% of the state’s elementary and middle 
schools). For Maryland, there were 44 stuck 
or chronically low-performing schools in 
total, with 22 of those identified for reading 
and 31 for math (9 schools fell into both categories).
For Indiana, there were 228 stuck
or chronically low-performing schools, 155 
identified for reading and 147 for math (74 
schools fell into both categories). The key 
finding of this report is that among initially 
low-performing schools, “some schools are
improving; others are stuck” (p. 1). Further, 
the authors emphasize that some schools 
persistently produced worse results than 
95% of schools in their states, even as they 
managed to make some gains. In conclusion, 
they recommend differentiated approaches: 
benchmarking of practices from low-performing
schools that made significant
progress, and targeted support and interventions
for low-performing schools that have
made little, no, or negative improvement.

III. Rationales Supporting
the Findings and Conclusions

The study grows out of the new federal policy
movement and the urgent need for empirical
research for policy guidance. The authors
note:

In recent months, the federal government
has put billions of dollars
on the table with a demand for real 
action in turning around our coun- 
try’s lowest performing schools. At 
the same time, federal and state 
leaders are considering future di- 
rections for education policy. In 
this context, understanding recent 
patterns of school improvement is 
particularly important (p. 1). 

At bottom, the new policy supports
the rationale of the study to separate two dif- 
ferent kinds of low-performing schools, (1) 
schools that are chronically low-performing 
without any indication of major improve- 
ment (cell D in Figure 1), and (2) schools 
that are low-performing initially at the base- 
line but show great improvement over the 
course of five years (cell C in Figure 1).

The study is based on the premise that there 
are good schools and bad schools in terms of 
academic performance and improvement,
and that we can and must identify the bad
schools for the sake of children and society.
In lieu of “bad,” the report uses terms
such as “stuck.” 

There is, of course, an underlying logic to 
such categorizations, but in order to identify 
those stuck schools and turn them around, 
the measures and methods used for identifi- 
cation must be valid, reliable, and fair. Un- 
fortunately, the study does not address any 
of the key psychometric and statistical issues 
that may threaten the validity of its findings 
and conclusions. This decision may be un- 
derstood in light of the report’s target au- 
dience of the policy community rather than 
the research community. However, this se- 
rious omission of important scientific and 
technical issues can undermine the very
rationale and purpose of the study. This report
offers policy guidance without engaging
with important methodological issues. The
most important problems concerning validi- 
ty and reliability are discussed below, in the 
review of research methods and findings. 

IV. The Report’s Use of 
Research Literature 

This report is identified as the first of a four- 
part “Stuck Schools Series” intended to 
“provide educators, policymakers, and the 
public with a framework for using data to 
identify schools and districts that are making 
academic progress or that desperately need 
help” (p. 1). Notwithstanding this goal, the 
report’s conceptual and analytical frame- 
work does not build on any established 
theory or prior research, and it fails to capitalize
on recent advances in value-added
growth model experiments in several states. 2
The study’s idea of differentiating schools
by two separate dimensions (performance
and improvement) is not new, but it doesn’t
learn from earlier efforts. The report does
not provide any references to the extensive 
prior research on this topic. The current 
(NCLB-linked) school accountability sys- 
tems in most states rely heavily on the indi- 
cators of schools’ academic status rather 
than their progress, although they may com- 
bine the two pieces of information for a final 
decision. Previous studies have found that 
the relationship between the status and 
progress of school achievement is generally 
tenuous. The Education Trust report shows 
this same pattern; among low-performing 
schools, the authors observe both fast and 
slow rates of improvement. By using lessons 
from earlier research, the report’s authors 
could have examined whether such differen- 
tiation of school improvement levels is reli- 
able and valid. 

The scope and depth of data analysis in the 
report are highly limited. This study is selective
in the sense that it focuses attention on
one particular category of schools that
are low-performing and low-improving at 
the same time. The study further attempts to 
refine that category by identifying higher- 
risk schools that are not only stuck but also 
chronically low-performing. However, as
discussed below, this further classification is
highly arbitrary, and the report’s operational 
definitions are not based on research litera- 
ture. What, for instance, is the rationale for 
targeting the bottom 25% or the bottom 5%? 
Finally, although the report attempts to dis- 
criminate between “stuck” schools and 
“chronically low-performing” schools, the 
underlying basis for finding schools in both 
low performance and low improvement is
similar. 

V. Review of the Report’s 
Methods 

The report’s designation of some schools as 
“stuck” suggests that the schools them- 
selves — rather than the structural and re- 
source issues within which those schools 
carry on — should be blamed and “turned 
around.” That is the authors’ clear position, 
which would explain why they eschewed a 
more neutral term such as “struggling” 
schools. No matter the term, however, the 
key issue is how the study identifies such 
schools. Although the report never makes its 
methods explicit, the description that is pro- 
vided in the appendix shows that it uses a 
linear regression method to estimate the
regression slope (i.e., annual growth rate)
and uses that estimated regression coefficient
to classify schools into three levels of
improvement. 

This approach immediately raises questions. 
What if schools showed a curvilinear
rather than a linear pattern of
growth? The assumption of linear growth
means that schools make an equal increment 
of proficiency gains every year (e.g., a 3-point
gain per year over the five-year period = 3
times 5 = a 15-point gain total). The assumption
may be reasonable given the limited
number of years available for tracking 
school performance trends, and it actually 
may fit most cases. However, the imposition 
of this particular growth model across all 
schools runs the risk of misestimating growth
rates and dismissing other possible patterns 
of growth. 
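
As a rough illustration of how a single linear slope can misdescribe a school's trajectory, the sketch below fits a least-squares line to a hypothetical school whose gains taper off over time. The data and function names are invented for illustration and do not come from the report or from the Maryland or Indiana data.

```python
def linear_slope(rates):
    """Least-squares slope of rates on year index 0, 1, 2, ..."""
    n = len(rates)
    mean_x = (n - 1) / 2
    mean_y = sum(rates) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(rates))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx

def yearly_gains(rates):
    """Actual year-over-year changes in the proficiency rate."""
    return [later - earlier for earlier, later in zip(rates, rates[1:])]

# Hypothetical school: large early gains that level off (a curvilinear pattern).
plateau = [50, 60, 66, 69, 70]

print(linear_slope(plateau))   # 4.9
print(yearly_gains(plateau))   # [10, 6, 3, 1]
```

The fitted slope suggests a steady gain of about 4.9 points per year, while the actual gains shrink from 10 points to 1 point. A classification based on the slope alone would not distinguish this school from one that is still improving at a constant rate.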

The report also does not consider (or simply 
does not report) the reliability of estimating 
growth rate through a time-series regression 
method. To explore this issue, I conducted a 
re-analysis of the same Maryland data and 
generated regression coefficients with stan- 
dard errors and indicators of statistical signi- 
ficance. The exercise reveals that many of 
the schools classified as high-improving or 
low-improving by this study are not really 
showing a consistent “linear” pattern of improvement.
This calls into question the reliability
of the report’s measures of schools’
aggregate performance trends.

In Figure 2, I illustrate this issue with the 
same data from schools in Maryland 2005- 
09. 4 The Figure focuses on just one school 
in Maryland, showing tremendous instabili- 
ty. The school has a generally upward per- 
formance trend until 2008, followed by an 
unexpected large decline in 2009. The linear 
regression method (the approach that the 
Education Trust study used) would identify 
the slope of regression, giving an annual 
growth rate of -1.19. As shown by the direc- 
tion of fitted regression line in Figure 2, the 
school’s performance trend looks negative
despite earlier positive gains. However, the 
standard error of the regression coefficient 
(information that the Education Trust study 
did not consider or report) is 4.45, and the 
growth rate is not statistically significant: 
the 95% confidence interval of this slope 
ranges from a low of -10 points to a high of
+8 points. In other words, the linear regression
model does not fit this particular
school’s data, and there is no systematic “linear”
pattern of growth that we can draw
from these five years of data due to the outlier
(i.e., idiosyncratic test result in 2009).

Figure 2. One Maryland school’s reading proficiency rate trend during
2005-2009 (the line is estimated through a simple linear regression of the
percent-proficient variable on the academic-year variable)

Despite the uncertainty of the growth pattern,
this school would be classified as a
low-improving school by the method used in
the study. As one would expect, part of the
inconsistency seems to be related to school
size; the smaller a school, the more inconsistent
or unstable its average proficiency over
time. There were only about 95 students in
the Figure 2 school who took the test, and
this uncertainty is likely to worsen if we
break down the school by subgroups.

In the report, after a growth rate has been 
identified, the authors classified schools into 
quartiles, with a focus on the bottom quar- 
tile. Their use of quartiles as a reference 
point of classification is justifiable by statis- 
tical analysis convention, but the rationale 
for their choice is not explained to their pol- 
icy audience. In order to decide how much 
academic growth is good enough, the study 
chose to use a norm-referenced classifica- 
tion scheme as opposed to a criterion- 
referenced classification. A major problem
with this approach is that it does not recognize
broad improvement. Schools or students
are pitted against one another, and no
matter how much positive growth they 
make, some of them will, by definition, still 
be below the norm. Accordingly, their 
progress will not be recognized. In contrast, 
a criterion-referenced approach involves set- 
ting desired standards for growth based on 
externally determined criteria such as curri- 
cular-based or age- or grade-based expecta- 
tions for student performance (e.g., value- 
added growth models adopted by states like 
North Carolina and Tennessee). This requires
setting performance standards for each
grade and connecting them across grades.
The approach used in the report also raises 
an unanswered question as to how much 
growth is sufficient to warrant proficiency in 
the future, and how soon. 
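
The difference between the two classification schemes can be shown with a small sketch. It assumes a bottom-quartile rule for the norm-referenced case and a fixed growth target of 3 points per year for the criterion-referenced case. The target, the data, and the function names are hypothetical and are not drawn from the report or from any state's growth model.

```python
import statistics

def norm_referenced_low(growth_rates):
    """Flag schools at or below the first-quartile cut point of the group."""
    cut = statistics.quantiles(growth_rates, n=4)[0]
    return [g <= cut for g in growth_rates]

def criterion_referenced_low(growth_rates, target):
    """Flag schools whose growth falls short of a fixed external target."""
    return [g < target for g in growth_rates]

# Hypothetical annual gains (percentage points) for eight schools.
before = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
# Every school raises its annual gain by 10 points.
after = [g + 10.0 for g in before]

# Norm-referenced: the same number of schools is flagged either way.
print(sum(norm_referenced_low(before)), sum(norm_referenced_low(after)))  # 2 2
# Criterion-referenced (assumed target of 3.0): broad improvement clears the list.
print(sum(criterion_referenced_low(before, 3.0)),
      sum(criterion_referenced_low(after, 3.0)))  # 2 0
```

Under the quartile rule, two schools remain "low-improving" even after every school improves by the same large amount, which is the sense in which the norm-referenced approach cannot recognize broad improvement.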

VI. Review of the Validity 
of the Findings and 
Conclusions 

The authors argue that 



progress differs drastically among 
the states. For example, on average, 
reading and math proficiency rates 
in Maryland have improved sub- 
stantially in recent years, yet in 
other states — Indiana, for exam- 
ple — average performance has re- 
mained flat. (p. 2) 



Although this statement can be true in gen- 
eral, simple interstate comparisons of 
achievement trends based on their own state 
assessment results can be misleading. As 
noted in the report, the two states’ perfor- 
mances on NAEP as well as their own state 
tests are very similar; this indicates that the 
rigor of their proficiency standard is compa- 
rable. However, the state-assessment trends 
in Indiana and Maryland diverge, despite 
common flat trends on NAEP (see Figure 3). 
Given the flat NAEP trends in both states, it 
appears that Maryland’s state test score 
gains might be an artifact of something not 
related to authentic educational progress. 
From a psychometric perspective, these are 
extraneous factors not transferrable to an
independent, low-stakes NAEP test (e.g.,
narrowing the curriculum and teaching to
the test). Although the report does not give
any clues about the specific practices in
these particular states, prior research suggests
that this kind of evidence raises a question
about the validity of using state test
score results as a sole measure of academic
progress under NCLB. 5

Figure 3. Grade 4 Reading Proficiency Rate Trends on NAEP versus State
Assessments in Indiana and Maryland (series shown: IN state test, IN NAEP
test, MD state test, MD NAEP test)

The report shows an improvement gap be- 
tween high-improving and low-improving 
schools of about 4-6 percentage points. For 
Indiana, the gap is from 2.2 to -1.9 percen- 
tage points; for Maryland, it’s from 0.5 to 
5.6 percentage points. But the report did not 
explore possible causes for the variability of 
improvement levels. It does show that high- 
improving schools have significantly more 
minority and low-income students than low- 
improving schools in Maryland, whereas the 
student demographics are very similar be- 
tween the two groups of schools in Indiana. 
But the reader is left to ponder why this 
might be. 

In fact, the reason why Maryland’s high- 
minority and low-income schools showed a 
greater degree of improvement is most likely 
a statistical artifact known as regression to 
the mean. 6 My re-analysis of the same data 
from Maryland confirms that pattern; the 
correlation between initial status and growth 
rate is -0.72 in both reading and math.
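
A small simulation can show why regression to the mean alone produces a negative correlation between initial status and measured growth. This is a sketch on simulated data, not a re-analysis of the Maryland data, and it is not meant to reproduce the -0.72 figure. It assumes that every school has a stable true proficiency rate with no real growth, that each year's observed rate adds independent random noise, that status is the mean of the first three years, and that growth is the least-squares slope over five years. All parameter values and function names are hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_status_growth_correlation(n_schools=2000, noise_sd=5.0, seed=0):
    """Correlate baseline status with fitted slope when true growth is zero."""
    rng = random.Random(seed)
    centered_years = (-2, -1, 0, 1, 2)  # sum of squares = 10
    statuses, slopes = [], []
    for _ in range(n_schools):
        true_rate = rng.gauss(70.0, 5.0)  # stable true level, no real growth
        observed = [true_rate + rng.gauss(0.0, noise_sd) for _ in range(5)]
        statuses.append(sum(observed[:3]) / 3)
        # With centered years, the OLS slope reduces to sum(x * y) / sum(x^2).
        slopes.append(sum(x * y for x, y in zip(centered_years, observed)) / 10)
    return pearson(statuses, slopes)

print(simulate_status_growth_correlation())
```

Because the same noisy early-year scores enter both the status measure and the slope, schools that happen to score low at baseline tend to show higher fitted growth, and the correlation comes out negative even though no school truly improved.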

Beyond this likely regression phenomenon, 
there is no information in the report that can 
help differentiate high-improving and low- 
improving schools. Unless we understand 
the school mechanism (such as school input, 
context, or process variables) that facilitates 
or constrains the differential pattern of 
growth, a simple presentation of demograph- 
ic differences can be misleading. 7 While 
school-related effects may vary from state to 
state, it is worth investigating those factors
that contribute to value-added academic
growth beyond the effects of student and 
family background characteristics. 

VII. Usefulness of the Report 
for Guidance of Policy 
and Practice 

The overall conceptual framework of the re- 
port helps bring more attention to the issues 
of validating and using school-level perfor- 
mance trend data for accountability. The re- 
port’s idea of measuring and recognizing 
growth in addition to status is part of a gener- 
al improvement to the current exclusive and 
narrow focus on a year-by-year snapshot 
model of school evaluation under NCLB. 
However, the report’s methods are so simplistic,
arbitrary, and poorly matched to the report’s
own assumptions that they are more harmful
than helpful to sound policymaking.

The report’s norm-referenced model guaran- 
tees failed schools independent of their true 
performance and improvement levels. There 
will always be winners and losers when the 
calculation is based on comparisons of 
schools’ relative performance or improve- 
ment against percentile ranks rather than ab- 
solute benchmarks. In fact, this purely norm- 
referenced approach may pose potential con- 
flicts in the real policy world, since it goes 
against the spirit of setting common stan- 
dards for all. We need further research and 
policy discussion with regard to setting de- 
sirable and feasible goals of school perfor- 
mance and improvement targets. 

There remain outstanding questions about 
the validity and reliability of the measures 
and methods used by the study. The differ- 
ence shown in Figure 3 above, between the 
improvement patterns based on national ver- 
sus state assessment results, suggests that 
the report’s sole reliance on state assessment 
data can be misleading. Further, many
schools do not follow a strictly linear pattern
of improvement (i.e., same incremental
gains each year), and thus the report’s imposition
of a linear growth model on all
schools is questionable. This is more problematic
in small schools where the school-level
aggregate performance patterns are not
highly consistent and stable over time. Since 
the Education Trust plans to report the anal- 
ysis of school subgroup performance as part 
of this series of publications, it should se- 
riously consider the reality that it becomes 
more challenging to reliably measure growth 
for student subgroups in small schools. 

The utility of the current report is also limited
since it did not examine the school
characteristics associated with differences 
between low-improving and high-improving
schools that had low initial performance
status. Consequently, the authors
are vague about what specific strategies — 
such as benchmarking, funding, reconstitu- 
tion, and capacity-building — are more viable 
and effective options for identified schools. 
This question should, in fact, remain unans- 
wered until we know whether differences in 
growth rates are based on reliable and valid 
measures and if so, what school factors 
caused these differences. Using the frame- 
work shown in Figure 1, how can we help 
struggling schools move from cell D to cell 
C, and then ultimately to cell A?